Add defence for DeepCompile w/o optimizer #7225

Merged (3 commits) on Apr 17, 2025

Conversation

HollowMan6
Contributor

Similar to #7211

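When no optimizer is specified (e.g. for ZeRO-3 pure inference), the engine's optimizer is of type DeepSpeedZeRoOffload instead of DeepSpeedZeroOptimizer_Stage3, and DeepSpeedZeRoOffload has no parameter_offload attribute. DeepCompile's init_z3 accesses optimizer.parameter_offload unconditionally, so it fails with the AttributeError shown in the traceback below. The relevant branch in deepspeed/runtime/engine.py is: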

if isinstance(optimizer, DummyOptim):
    log_dist("Creating ZeRO Offload", ranks=[0])
    zero_param_parallel_group = groups._get_zero_param_intra_parallel_group()
    if self.zero_hpz_partition_size() > 1 and zero_param_parallel_group is None:
        self._set_zero_group_parallelism()
        zero_param_parallel_group = groups._get_zero_param_intra_parallel_group()
    optimizer = DeepSpeedZeRoOffload(
        self.module,
        timers=timers,
        ds_config=self.config,
        overlap_comm=self.zero_overlap_comm(),
        prefetch_bucket_size=self.zero_prefetch_bucket_size(),
        max_reuse_distance=self.zero_max_reuse_distance(),
        max_live_parameters=self.zero_max_live_parameters(),
        param_persistence_threshold=self.zero_param_persistence_threshold(),
        model_persistence_threshold=self.zero_model_persistence_threshold(),
        offload_param_config=self.zero_offload_param(),
        mpu=self.mpu,
        zero_param_parallel_group=zero_param_parallel_group,
        zero_quantized_weights=self.zero_quantized_weights(),
        zero_quantized_nontrainable_weights=self.zero_quantized_nontrainable_weights(),
        zero_module_granularity_threshold=self.zero_module_granularity_threshold(),
        log_trace_cache_warnings=self.zero_log_trace_cache_warnings(),
    )

  File "deepspeed/runtime/engine.py", line 3919, in compile
    backend = init_z3(self, backend, compile_config, compile_kwargs, schedule)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "deepspeed/compile/init_z3.py", line 36, in init_z3
    optimizer.parameter_offload._remove_module_hooks()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'DeepSpeedZeRoOffload' object has no attribute 'parameter_offload'
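
For illustration only, the kind of guard this PR title describes could look roughly like the sketch below. It is not the merged patch. `FakeZeroOffload`, `FakeStage3Optimizer`, and `remove_zero3_module_hooks` are hypothetical stand-ins, and the sketch assumes that the offload object exposes the same `_remove_module_hooks` method that the traceback shows being called on `parameter_offload`.

```python
class FakeZeroOffload:
    """Hypothetical stand-in for DeepSpeedZeRoOffload (owns the module hooks)."""

    def __init__(self):
        self.hooks_removed = False

    def _remove_module_hooks(self):
        self.hooks_removed = True


class FakeStage3Optimizer:
    """Hypothetical stand-in for DeepSpeedZeroOptimizer_Stage3.

    Keeps its offload helper under `parameter_offload`, which is the
    attribute the traceback shows init_z3 accessing.
    """

    def __init__(self):
        self.parameter_offload = FakeZeroOffload()


def remove_zero3_module_hooks(optimizer):
    """Remove ZeRO-3 module hooks without assuming the optimizer type.

    Returns True if hooks were removed, False if there was nothing to do.
    """
    if optimizer is None:
        return False

    # Full Stage 3 optimizer: hooks live on the nested offload helper.
    offload = getattr(optimizer, "parameter_offload", None)
    if offload is not None:
        offload._remove_module_hooks()
        return True

    # No optimizer configured: the "optimizer" is the offload object itself.
    if hasattr(optimizer, "_remove_module_hooks"):
        optimizer._remove_module_hooks()
        return True

    return False


if __name__ == "__main__":
    print(remove_zero3_module_hooks(FakeStage3Optimizer()))
    print(remove_zero3_module_hooks(FakeZeroOffload()))
    print(remove_zero3_module_hooks(None))
```

Checking for the attribute rather than the concrete class keeps the sketch independent of which object the engine happens to store in its optimizer slot.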

https://github.com/deepspeedai/DeepSpeed/blob/56005d2b256eb81a88cba0a1984375f9663a3110/deepspeed/runtime/engine.py#L1684-L1707
Signed-off-by: Hollow Man <[email protected]>
Contributor

@tohtana tohtana left a comment

@HollowMan6 Great catch, we appreciate your contribution!

@tohtana tohtana enabled auto-merge April 17, 2025 18:54
@tohtana tohtana disabled auto-merge April 17, 2025 18:56
Signed-off-by: Logan Adams <[email protected]>
@loadams
Collaborator

loadams commented Apr 17, 2025

Thanks @HollowMan6 - we will prioritize merging this so we can push out a patch release for better DeepCompile support.

@tohtana tohtana enabled auto-merge April 17, 2025 19:26
@HollowMan6
Contributor Author

HollowMan6 commented Apr 17, 2025

Thanks for the quick review! Once this PR is merged together with #7226 and #7224, DeepSpeed should build (compile) fine on AMD machines as well. However, I ran into two separate issues when using DeepCompile together with OpenRLHF. I have opened #7229 and #7228 to explain them.

@tohtana tohtana added this pull request to the merge queue Apr 17, 2025
Merged via the queue into deepspeedai:master with commit 86e51e6 Apr 17, 2025
11 checks passed
@HollowMan6 HollowMan6 deleted the defend-dc branch April 17, 2025 22:26
ys950902 pushed a commit to ys950902/DeepSpeed that referenced this pull request May 21, 2025
---------

Signed-off-by: Hollow Man <[email protected]>
Signed-off-by: Logan Adams <[email protected]>
Co-authored-by: Masahiro Tanaka <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Signed-off-by: yisheng <[email protected]>
deepcharm pushed a commit to deepcharm/DeepSpeed that referenced this pull request Jun 16, 2025
---------

Signed-off-by: Hollow Man <[email protected]>
Signed-off-by: Logan Adams <[email protected]>
Co-authored-by: Masahiro Tanaka <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Signed-off-by: Max Kovalenko <[email protected]>